feat: Governance changes #118

Open
viswa-uipath wants to merge 4 commits into main from feat/governance-core
viswa-uipath commented Jun 9, 2026

TEST IN PROGRESS: PR that includes runtime changes for governance

Development Package

  • Add this package as a dependency in your pyproject.toml:
[project]
dependencies = [
  # Exact version:
  "uipath-runtime==0.11.0.dev1001180443",

  # Any version from PR
  "uipath-runtime>=0.11.0.dev1001180000,<0.11.0.dev1001190000"
]

[[tool.uv.index]]
name = "testpypi"
url = "https://test.pypi.org/simple/"
publish-url = "https://test.pypi.org/legacy/"
explicit = true

[tool.uv.sources]
uipath-runtime = { index = "testpypi" }

viswa-uipath and others added 2 commits June 9, 2026 17:12
…d adapter wiring

Adds the runtime-side governance subsystem behind the
``EnablePythonGovernanceChecker`` feature flag. When the flag is off,
none of this is imported — the gate is at
``src/uipath/runtime/wrapper.py:apply_governance_wrapper`` and the
governance subtree stays off the startup path.
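
A minimal sketch of the gate described above, not the code from this PR. It assumes the flag is read from an environment variable (the variable name here is hypothetical) and uses a caller-supplied `load_governance` callable as a stand-in for the deferred import of the governance subtree. The fail-open branch mirrors the behavior the review summary attributes to `apply_governance_wrapper`.

```python
import os
from typing import Any, Callable

# Hypothetical env-var name; the real flag lookup is not shown in this PR text.
FLAG_ENV = "ENABLE_PYTHON_GOVERNANCE_CHECKER"


def _flag_enabled() -> bool:
    # Assumption: the feature flag is surfaced as an environment variable.
    return os.environ.get(FLAG_ENV, "").strip().lower() in {"1", "true"}


def apply_governance_wrapper_sketch(
    runtime: Any,
    load_governance: Callable[[], Callable[[Any], Any]],
) -> Any:
    """Return `runtime` untouched when the flag is off; wrap it otherwise."""
    if not _flag_enabled():
        return runtime  # flag off: the loader is never called
    try:
        wrap = load_governance()  # stands in for the lazy import
        return wrap(runtime)
    except Exception:
        # Fail open: a governance failure must not break the runtime.
        return runtime
```

Passing the loader in as a callable keeps the sketch testable without a real governance package on the import path.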

Architecture:
- ``src/uipath/runtime/governance/wrapper.py:GovernanceRuntime`` —
  proxies the wrapped runtime, fires BEFORE_AGENT / AFTER_AGENT at the
  runtime boundary, materialises the evaluator + framework adapter
  lazily on the first hook fire. Step-isolated dispose; init
  side-effects (model-name ContextVar, agent-type selector, prefetch)
  all live behind the FF gate so an FF-off path is a true no-op.
- ``src/uipath/runtime/governance/native/`` — in-process policy
  evaluator: policy fetch (size-bounded, currency-anchored amount
  detection, expanded verb pattern for proposal/SOW commitments),
  YAML-to-index compiler, bounded-pool compensating /runtime/govern
  call, agent-type query param (conversational vs autonomous),
  job-context payload (folder/job/process/agent/version keys).
- ``src/uipath/runtime/governance/audit/`` — pluggable sink framework
  with a background-thread queue. Default sinks: traces (OTel spans,
  always on, platform-mandated) and console (stderr, opt-in via
  ``UIPATH_GOVERNANCE_CONSOLE_LOG``). Sink failures circuit-break
  after 10 consecutive errors; counters reset on re-register so a
  fresh instance doesn't inherit a tripped state. ``close()`` shutdown
  is bounded — ``put_nowait`` sentinel + ``_shutdown.set()`` signal so
  a wedged sink can't hang process exit.
- ``src/uipath/runtime/governance/delegation_guard.py`` — async-aware
  depth guard, patches both ``invoke`` and ``ainvoke`` with sync/async
  wrappers matched via ``iscoroutinefunction``. Per-agent depths live
  in a single module-level ``ContextVar`` holding a dict keyed by
  ``id(agent)``. ``Context`` objects hold strong references to their
  ContextVars, which keeps those variables from being garbage-collected,
  so the prior one-ContextVar-per-agent design was an unbounded leak.
- ``src/uipath/runtime/governance/audit/traces.py`` — rule-level OTel
  span surfaces matched non-allow actions as ``Status.ERROR``
  (including audit-mode violations the runtime intentionally didn't
  block). Hook spans stay UNSET; severity belongs on the rule that
  fired.
- ``src/uipath/runtime/registry.py`` — ``UiPathWrappedRuntimeFactory``
  wraps every registered factory so every runtime it produces passes
  through ``apply_governance_wrapper``.

Notable contracts:
- ``_extract_governable_text`` (wrapper.py): pulls clean content out
  of arbitrary runtime payloads. Walks dicts (priority keys: content /
  text / output / answer / message / result / arguments / thinking),
  list-of-blocks, pydantic models, dataclasses, plain objects. Cycle-
  safe, depth-capped, 8000-char budget. Replaces the prior
  ``str(value)[:2000]`` shortcut that produced dict-repr garble.
- ``commitment_concern`` (``A.10.4``): OR semantics with currency-
  anchored amount detection. Verb pattern covers first-person promise
  verbs and proposal/SOW markers ("Cost: $X", "fixed scope",
  "Deliverables", "Timeline: N days", "I propose"). Bare percentages
  intentionally not matched — they false-positive on status text.
- Compensation payload: ``FiredRule`` TypedDict carries per-rule
  metadata for LLMOps trace records; the validators list is derived
  from it.
- Job-context resolution: memoized once per process via
  ``functools.lru_cache``; tests can invalidate via ``cache_clear``.
- Process-level governance state (conversational selector,
  job-context cache) reset between tests via an autouse fixture in
  ``tests/conftest.py``.
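
As a rough illustration of the ``_extract_governable_text`` contract above, the following sketch walks dicts, lists, and dataclasses with a cycle guard, a depth cap, and a character budget. It is not the PR's implementation: the depth cap value, the fallback to non-priority keys, and the function name are assumptions, and pydantic and plain-object handling are left out to keep it dependency-free.

```python
import dataclasses
from typing import Any, Optional, Set

PRIORITY_KEYS = (
    "content", "text", "output", "answer",
    "message", "result", "arguments", "thinking",
)
MAX_DEPTH = 6  # hypothetical cap; the real value is not given in the PR text


def extract_text(
    value: Any,
    budget: int = 8000,
    depth: int = 0,
    seen: Optional[Set[int]] = None,
) -> str:
    """Pull readable text out of a nested payload within `budget` chars."""
    seen = set() if seen is None else seen
    if budget <= 0 or depth > MAX_DEPTH or value is None:
        return ""
    if isinstance(value, str):
        return value[:budget]
    if isinstance(value, (int, float)):
        return ""  # numbers (and bools) carry no governable text here
    if id(value) in seen:
        return ""  # cycle guard: this container was already visited
    seen.add(id(value))

    if dataclasses.is_dataclass(value) and not isinstance(value, type):
        # Shallow field view; nested values are handled by the recursion.
        value = {f.name: getattr(value, f.name) for f in dataclasses.fields(value)}

    if isinstance(value, dict):
        # Assumption: priority keys first, then any remaining keys.
        ordered = [k for k in PRIORITY_KEYS if k in value]
        ordered += [k for k in value if k not in PRIORITY_KEYS]
        items = [value[k] for k in ordered]
    elif isinstance(value, (list, tuple)):
        items = list(value)
    else:
        return ""

    parts = []
    remaining = budget
    for item in items:
        text = extract_text(item, remaining, depth + 1, seen)
        if text:
            parts.append(text)
            remaining -= len(text) + 1  # +1 for the joining newline
        if remaining <= 0:
            break
    return "\n".join(parts)[:budget]
```

Compared with a ``str(value)[:2000]`` shortcut, this returns only string leaves, so dict punctuation and key names never reach the evaluator.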

Tests (225 passing):
- ``tests/test_evaluator.py`` — core evaluator + wrapper / adapter
  integration via captured audit events.
- ``tests/test_commitment_concern.py`` — verb/amount/deadline OR
  semantics, the proposal-style sample that originally slipped past
  the rule, URL-fragment digits don't false-positive, percentage-only
  status text stays silent.
- ``tests/test_delegation_guard.py`` — sync + async wrapper shapes,
  shared depth counter across modes, leak fix (100 install/uninstall
  cycles keep one ContextVar), multi-agent isolation.
- ``tests/test_dispose_isolation.py`` — each governance-side dispose
  step survives upstream failures; delegate dispose still propagates.
- ``tests/test_text_extraction.py`` — dict / list-of-blocks /
  pydantic / dataclass / cycle / budget cap.
- ``tests/test_audit_register_sink.py`` — sink failure counter reset
  on re-register, duplicate register is a no-op, full lifecycle.
- ``tests/test_traces_severity.py`` — rule span ERROR for matched
  non-allow, hook span stays UNSET regardless of final_action.
- ``tests/test_guardrail_compensation.py`` — compensating /govern call
  payload, headers, URL composition, evaluator integration.
- ``tests/test_policy_agent_type.py`` — conversational vs autonomous
  selector, policy URL query param.
- ``tests/test_registry.py`` — factory wrapping, governance attached
  to every registered factory.
- ``tests/test_wrapper.py`` — FF gate + lazy-import contract.
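
The sink circuit-breaker behavior that ``tests/test_audit_register_sink.py`` exercises might look roughly like the sketch below. `SinkRegistry` is a hypothetical stand-in for the audit manager, and it omits the background-thread queue. It also assumes that "duplicate register" means registering the same sink object again, while registering a different object under the same name counts as a fresh instance.

```python
from typing import Callable, Dict

MAX_CONSECUTIVE_FAILURES = 10  # threshold taken from the commit message

Sink = Callable[[dict], None]


class SinkRegistry:
    """Hypothetical stand-in for the audit manager's sink bookkeeping."""

    def __init__(self) -> None:
        self._sinks: Dict[str, Sink] = {}
        self._failures: Dict[str, int] = {}

    def register(self, name: str, sink: Sink) -> None:
        if self._sinks.get(name) is sink:
            return  # assumption: same object again is a no-op
        self._sinks[name] = sink
        self._failures[name] = 0  # fresh instance starts with a clean counter

    def emit(self, event: dict) -> None:
        for name, sink in self._sinks.items():
            if self._failures[name] >= MAX_CONSECUTIVE_FAILURES:
                continue  # circuit open: skip the broken sink
            try:
                sink(event)
                self._failures[name] = 0  # a success resets the streak
            except Exception:
                self._failures[name] += 1  # never propagate sink errors
```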

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 9, 2026 11:57
@viswa-uipath viswa-uipath requested a review from a team as a code owner June 9, 2026 11:57

Copilot AI left a comment

Pull request overview

This PR introduces a feature-flag–gated governance layer in uipath-runtime, wiring governance into runtime creation and execution while keeping governance imports and backend calls off the hot path when disabled. It also adds a native policy-fetch/compile pipeline, OpenTelemetry-based auditing, delegation-depth protection, and compensating /runtime/govern calls for disabled centralized guardrails, with comprehensive new test coverage and supporting documentation.

Changes:

  • Added an FF-gated runtime wrapper entrypoint (apply_governance_wrapper) and default factory wrapping (UiPathWrappedRuntimeFactory) to apply governance automatically.
  • Implemented native governance backend integration (policy fetch + YAML→index compilation), compensation calls, and supporting runtime state (enforcement mode, agent-type selector).
  • Added audit sink framework (mandatory traces sink + optional console), improved delegation guard (sync+async), and extensive tests/docs/dependency updates.

Reviewed changes

Copilot reviewed 33 out of 34 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| `tests/test_wrapper.py` | Tests FF-gated lazy import + fail-open behavior for wrapper application. |
| `tests/test_traces_severity.py` | Validates OTel span status semantics for hook vs rule spans. |
| `tests/test_text_extraction.py` | Tests structured payload text extraction for governance scanning. |
| `tests/test_registry.py` | Updates registry tests for wrapper application and adds wrapper-specific coverage. |
| `tests/test_policy_agent_type.py` | Tests conversational/autonomous agent-type selector and policy URL param behavior. |
| `tests/test_guardrail_compensation.py` | Tests compensation payload/headers/URL composition, error swallowing, and evaluator integration. |
| `tests/test_evaluator.py` | Tests enforcement-mode semantics, audit emission, and sink-failure isolation. |
| `tests/test_dispose_isolation.py` | Tests step-isolated cleanup semantics in GovernanceRuntime.dispose(). |
| `tests/test_delegation_guard.py` | Tests async-aware delegation depth guard, idempotency, and leak prevention. |
| `tests/test_commitment_concern.py` | Tests updated commitment concern detector semantics and regressions. |
| `tests/test_audit_register_sink.py` | Tests audit sink circuit-breaker counter reset behavior on register/unregister. |
| `tests/conftest.py` | Adds autouse fixture to reset governance process-level state between tests. |
| `src/uipath/runtime/wrapper.py` | Adds FF-gated apply_governance_wrapper with lazy import + fail-open behavior. |
| `src/uipath/runtime/registry.py` | Wraps factories by default to apply runtime wrappers; adds apply_wrappers escape hatch. |
| `src/uipath/runtime/governance/wrapper.py` | Adds GovernanceRuntime, text extraction, adapter attachment, and runtime-boundary checks. |
| `src/uipath/runtime/governance/native/policy_api_client.py` | Implements policy URL building and single-shot policy fetch + parsing. |
| `src/uipath/runtime/governance/native/models.py` | Adds native policy model types (Rule, Check, Condition, PolicyIndex, etc.). |
| `src/uipath/runtime/governance/native/loader.py` | Adds cached policy loading with background prefetch and fail-open behavior. |
| `src/uipath/runtime/governance/native/guardrail_compensation.py` | Adds bounded background compensation pool and /runtime/govern POST logic. |
| `src/uipath/runtime/governance/native/backend_client.py` | Centralizes URL composition, headers, org/tenant/job context resolution, and tunables. |
| `src/uipath/runtime/governance/native/_yaml_to_index.py` | Parses backend YAML into native PolicyIndex (skip-malformed, partial-pack tolerant). |
| `src/uipath/runtime/governance/native/__init__.py` | Exposes native governance evaluator/loader/model APIs. |
| `src/uipath/runtime/governance/delegation_guard.py` | Implements shared-ContextVar delegation depth guard for sync+async entrypoints. |
| `src/uipath/runtime/governance/config.py` | Adds runtime-level cached enforcement mode state/config. |
| `src/uipath/runtime/governance/audit/traces.py` | Adds OTel traces sink and span attribute/status semantics. |
| `src/uipath/runtime/governance/audit/factory.py` | Adds sink factory (traces/console). |
| `src/uipath/runtime/governance/audit/console.py` | Adds optional console sink output formatting/filtering. |
| `src/uipath/runtime/governance/audit/base.py` | Adds audit event model + async audit manager with bounded queue and circuit-breaker. |
| `src/uipath/runtime/governance/audit/__init__.py` | Exposes audit framework public API. |
| `src/uipath/runtime/__init__.py` | Re-exports governance integration entrypoints from runtime package. |
| `pyproject.toml` | Bumps uipath-core floor and adds native governance deps + typing overrides. |
| `docs/runtime-wrapper-extension.md` | Documents the governance integration point, FF gating, and testing approach. |

Comment on lines +20 to +23
Failure mode is fail-open: when the organization id is unknown, the
access token is missing, the backend errors (one retry on transient
failures), or the body can't be parsed, the caller falls back to an
empty PolicyIndex. Nothing in this module ever raises to the caller.
Comment on lines +150 to +152
headers = governance_request_headers(json_body=True)
headers[TENANT_HEADER] = tenant_id
logger.info("Policy fetch starting (org=%s, tenant=%s)", org_id, tenant_id)
Comment on lines +130 to +138
for dumper in ("model_dump", "dict"):
    fn = getattr(value, dumper, None)
    if callable(fn):
        try:
            return _extract_governable_text(
                fn(), budget=budget, seen=seen, depth=depth + 1,
            )
        except Exception:  # noqa: BLE001 - fall through to other extractors
            break
Comment on lines +140 to +151
event = _prefetch_event
if event is not None:
    completed = event.wait(timeout=_PREFETCH_WAIT_SECONDS)
    if completed and _policy_index is not None:
        return _policy_index
    logger.warning(
        "Policy prefetch did not complete in %.1fs; "
        "agent will run without any policies",
        _PREFETCH_WAIT_SECONDS,
    )
_policy_index = PolicyIndex()
return _policy_index
Comment on lines +135 to +139
if cond.operator == "guardrail_fallback" and isinstance(
    cond.value, dict
):
    validator = str(cond.value.get("validator", ""))
    if validator:
Comment on lines +329 to +332
url = build_governance_url(org_id, GOVERN_API_PATH)
headers = governance_request_headers(json_body=True)
headers[TENANT_HEADER] = tenant_id

Comment on lines +541 to +546
if self._async_mode:
    # Wait for queue to drain
    try:
        self._queue.join()
    except Exception:
        pass
viswa-uipath and others added 2 commits June 10, 2026 15:33
…tiate audit vs enforce severity in traces

Text extraction (wrapper.py):
- Add "messages" to priority content keys for LangGraph-style state
  ({"messages": [...]}) so chat history leads the extracted blob.
- Walk lists newest-first so the latest message wins the budget when
  the conversation grows.
- New latest_only flag (passed by BEFORE_AGENT) reduces the chat
  history to the most recent message; flag resets on recursion so
  multi-block content within that message is still walked fully.
- Raise text cap 8K -> 64K to fit multi-turn chat.

Trace severity (audit/traces.py):
- Differentiate "actually blocked" from "advisory" violations:
  enforce-mode deny/escalate -> severity=ERROR + StatusCode.ERROR;
  audit-mode (any action) or enforce-mode audit-action -> severity=
  WARNING, Status left UNSET so the agent span isn't falsely marked
  failed.

Tests cover reverse list walk, latest_only semantics, the 64K cap,
and the audit/enforce severity matrix.
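
The audit/enforce severity matrix described in this commit can be written as a small lookup. This is a sketch under assumptions: the mode and action values are modeled as plain lowercase strings, the function name is hypothetical, and treating `allow` as "no severity" is inferred from the earlier "matched non-allow actions" wording rather than stated here. The real sink sets an OTel status; strings stand in for it so the sketch has no dependencies.

```python
from typing import Optional, Tuple


def rule_span_severity(mode: str, action: str) -> Optional[Tuple[str, str]]:
    """Map (enforcement mode, rule action) to (severity, span status)."""
    if action == "allow":
        return None  # assumption: allow actions carry no violation severity
    if mode == "enforce" and action in {"deny", "escalate"}:
        # Actually blocked: mark the rule span as failed.
        return ("ERROR", "ERROR")
    # Audit mode (any action) or an enforce-mode audit action: advisory only,
    # so the status stays UNSET and the agent span is not marked failed.
    return ("WARNING", "UNSET")
```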

Addresses all 7 Copilot review comments on PR #118 and switches the
default enforcement mode so empty-policy tenants pay zero per-call
audit overhead.

PR-118 review comments:
- policy_api_client docstring no longer claims "one retry on transient
  failures" — _get_once is and remains single-shot by design.
- Policy fetch GET drops Content-Type: application/json (was sent via
  json_body=True). Strict origin servers can 415 on unexpected
  Content-Type for GETs; the helper's own docstring recommends
  omitting it on reads.
- _extract_governable_text dumper loop now CONTINUES instead of BREAKS
  when model_dump() raises, so dict() is tried as documented ("fall
  through to other extractors").
- loader.get_policy_index distinguishes "prefetch did not complete in
  Xs" from "prefetch completed but produced no PolicyIndex" — prod
  triage can now tell a hung fetch from an auth / parse failure.
- disabled_guardrails defensively re-checks mapped_to_uipath=True AND
  policy_enabled=False on every guardrail_fallback condition. Matches
  the function's docstring and protects against multi-condition rules
  or any future code path that bypasses the evaluator gate.
- request_governance pre-checks UIPATH_ACCESS_TOKEN and skips when
  missing. Sending without a bearer guarantees a 401 per compensation
  call and pollutes logs; mirrors the org-id / tenant-id skip pattern
  already in place.
- AuditManager.flush(timeout=...) now honors its timeout via a
  time.monotonic() poll loop and warns if drain doesn't complete.
  Previously it called queue.Queue.join(), which accepts no timeout
  and can block indefinitely. That is risky at process exit, where
  _cleanup_audit_manager supplies a 2-second timeout that was being
  silently ignored.
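
A bounded drain of the kind described in the last item could be sketched like this. It is an assumption-laden illustration, not `AuditManager.flush` itself: the function name and poll interval are hypothetical, and it reads `queue.Queue.unfinished_tasks`, a public CPython attribute that the `queue` docs do not list.

```python
import queue
import time


def flush_with_timeout(
    q: queue.Queue, timeout: float, poll_interval: float = 0.01
) -> bool:
    """Wait up to `timeout` seconds for the queue to drain.

    Returns True if every item was marked done, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # unfinished_tasks drops to 0 once task_done() matched every put().
        if q.unfinished_tasks == 0:
            return True
        time.sleep(poll_interval)
    return q.unfinished_tasks == 0
```

A caller at process exit can log a warning when this returns False and carry on with shutdown instead of hanging on a wedged consumer.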

Default enforcement mode:
- get_enforcement_mode default fallback flipped from AUDIT to
  DISABLED. The server-supplied mode (applied by the policy loader on
  every successful fetch) still wins; the env-var override still
  works. Empty-policy / failed-fetch / pre-fetch tenants now
  short-circuit at evaluator.py:332 with no _emit_audit call, no OTel
  spans, no AuditManager queue traffic. Previously these scenarios
  silently fell through to AUDIT and produced ~40 empty governance
  spans per turn for an N=10 LLM-call agent.
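
The resolution order that results (programmatic value, then env var, then the DISABLED default) can be sketched as below. The enum values, the env-var name, and the setter are hypothetical stand-ins; only the ordering and the invalid-env fallback come from this commit message.

```python
import os
from enum import Enum
from typing import Optional


class EnforcementMode(Enum):
    DISABLED = "disabled"
    AUDIT = "audit"
    ENFORCE = "enforce"


MODE_ENV = "UIPATH_GOVERNANCE_MODE"  # hypothetical env-var name
_programmatic: Optional[EnforcementMode] = None


def set_enforcement_mode(mode: Optional[EnforcementMode]) -> None:
    """Programmatic override, e.g. the server-supplied mode after a fetch."""
    global _programmatic
    _programmatic = mode


def get_enforcement_mode() -> EnforcementMode:
    if _programmatic is not None:
        return _programmatic  # 1. programmatic value wins
    raw = os.environ.get(MODE_ENV)
    if raw:
        try:
            return EnforcementMode(raw.strip().lower())  # 2. env override
        except ValueError:
            pass  # an invalid env value falls through to the default
    return EnforcementMode.DISABLED  # 3. default: no audit traffic at all
```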

Tests (245 passing, +7 new):
- test_enforcement_mode_default.py pins the resolution order
  (programmatic > env > DISABLED default) and the
  invalid-env-falls-back-to-DISABLED behavior.
- test_request_governance_skipped_when_token_missing pins the new
  bearer-token skip path.
- _govern_env fixture now sets UIPATH_ACCESS_TOKEN; the headers test
  asserts the Authorization header is present (was a side-effect of
  the no-token test, which is now moved out).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sonarqubecloud

Quality Gate failed

Failed conditions
59.6% Coverage on New Code (required ≥ 90%)
C Reliability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud
